The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
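Two of the strategies named in this abstract, patch-based processing of oversized samples and k-fold cross-validation, can be sketched as follows. This is a minimal illustration, not code from the survey: the function names `extract_patches` and `k_fold_indices`, the square-patch tiling, and the round-robin fold assignment are all assumptions made for the example.

```python
from typing import List, Tuple

Image = List[List[float]]


def extract_patches(image: Image, patch: int, stride: int) -> List[Tuple[int, int, Image]]:
    """Tile a 2D image into square patches (hypothetical helper).

    Returns (row, col, tile) tuples. Border pixels that do not fit a
    full patch are skipped in this simplified sketch.
    """
    height, width = len(image), len(image[0])
    patches = []
    for r in range(0, height - patch + 1, stride):
        for c in range(0, width - patch + 1, stride):
            tile = [row[c:c + patch] for row in image[r:r + patch]]
            patches.append((r, c, tile))
    return patches


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, validation) index pairs.

    Samples are assigned to folds round-robin; each fold serves once
    as the validation set while the other folds form the training set.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val))
    return splits


if __name__ == "__main__":
    image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    print(len(extract_patches(image, patch=2, stride=2)))
    print(k_fold_indices(10, 5)[0])
```

In practice the patches would be fed to the model one at a time (or in small batches) and the per-patch predictions stitched back together; real pipelines usually also shuffle samples before building the folds.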
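A learner corpus collects language data produced by L2 learners, that is, learners of a second or foreign language. Such a resource is relevant to second language acquisition research, foreign language teaching, and automatic grammatical error correction. However, few learner corpora focus on learners of Chinese as a Foreign Language (CFL). We therefore propose to construct a large-scale, multidimensionally annotated Chinese learner corpus. To build the corpus, we first obtain a large number of topic-rich texts written by CFL learners. We then design an annotation scheme that includes a sentence acceptability score as well as grammatical-error and fluency-based corrections. We build a crowdsourcing platform to carry out the annotation efficiently (https://yaclc.wenmind.net). We name the corpus YACLC (Yet Another Chinese Learner Corpus) and release it as part of the CUGE benchmark (http://cuge.baai.ac.cn). By analyzing the original sentences and annotations in the corpus, we find that YACLC is of considerable size and very high annotation quality. We hope this corpus will further advance research on Chinese international education and Chinese automatic grammatical error correction.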
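The rapid progress of photorealistic synthesis techniques has reached a critical point where the boundary between real and manipulated images starts to blur. Recently, a mega-scale deep face forgery dataset, ForgeryNet, comprising 2.9 million images and 221,247 videos, has been released. It is by far the largest in terms of data scale, manipulations (7 image-level approaches, 8 video-level approaches), perturbations (36 independent and more mixed perturbations), and annotations (6.3 million classification labels, 2.9 million manipulated-area annotations, and 221,247 temporal forgery segment labels). This paper reports the methods and results of the ForgeryNet - Face Forgery Analysis Challenge 2021, which uses the ForgeryNet benchmark. Model evaluation is performed offline on a private test set. A total of 186 participants registered for the competition, and 11 teams made valid submissions. We analyze the top-ranked solutions and present some discussion of future work directions.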
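Dynamic Network Embedding (DNE) has recently attracted considerable attention, owing to the advantages of network embedding in various fields and the dynamic nature of many real-world networks. The input dynamic network to DNE is often assumed to change smoothly across snapshots, but this does not hold in all real-world scenarios. It is natural to ask whether existing DNE methods can perform well on input dynamic networks without smooth changes. To quantify this, an index called Degree of Changes (DoCs) is proposed, such that smaller DoCs indicate smoother changes. Our comparative study shows that several DNE methods are not robust enough to different DoCs, even when the corresponding input dynamic networks come from the same dataset, which makes these methods unreliable and hard to use in unknown real-world applications. To obtain an effective and more robust DNE method, we follow the notion of ensembles, in which each base learner adopts an incremental Skip-Gram embedding model. To further boost performance, a simple yet effective strategy is designed to enhance the diversity among base learners at each time step by capturing different levels of local-global topology. Extensive experiments demonstrate the superior effectiveness and robustness of the proposed method compared to state-of-the-art DNE methods, as well as the benefits of the special designs in the proposed method and its scalability.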
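Learning low-dimensional topological representations of networks in dynamic environments has attracted much attention because many real-world networks evolve over time. The main and common objective of Dynamic Network Embedding (DNE) is to efficiently update node embeddings while preserving network topology at each time step. The idea behind most existing DNE methods is to capture the topological changes at or around the most affected nodes (rather than all nodes) and update the node embeddings accordingly. Unfortunately, this kind of approximation, although it improves efficiency, cannot effectively preserve the global topology of a dynamic network at each time step, because it ignores the inactive sub-networks that receive accumulated topological changes propagated via high-order proximity. To tackle this challenge, we propose a novel node selection strategy that diversely selects representative nodes over the network, coordinated with a new incremental learning paradigm for Skip-Gram-based embedding. Extensive experiments show that GloDyNE, with only a small fraction of nodes selected, already achieves superior or comparable performance w.r.t. state-of-the-art DNE methods in three typical downstream tasks. In particular, GloDyNE significantly outperforms the other methods in the graph reconstruction task, which demonstrates its ability to preserve global topology. The source code is available at https://github.com/houchengbin/glodyne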
Facial attribute editing aims to manipulate single or multiple attributes of a face image, i.e., to generate a new face with desired attributes while preserving other details. Recently, generative adversarial networks (GANs) and encoder-decoder architectures have usually been combined to handle this task, with promising results. Based on the encoder-decoder architecture, facial attribute editing is achieved by decoding the latent representation of the given face conditioned on the desired attributes. Some existing methods attempt to establish an attribute-independent latent representation for further attribute editing. However, such an attribute-independent constraint on the latent representation is excessive because it restricts the capacity of the latent representation and may result in information loss, leading to over-smooth and distorted generation. Instead of imposing constraints on the latent representation, in this work we apply an attribute classification constraint to the generated image to guarantee only the correct change of the desired attributes, i.e., to "change what you want". Meanwhile, reconstruction learning is introduced to preserve attribute-excluding details, in other words, to "only change what you want". In addition, adversarial learning is employed for visually realistic editing. These three components cooperate with each other to form an effective framework for high-quality facial attribute editing, referred to as AttGAN. Furthermore, our method is directly applicable to attribute intensity control and can be naturally extended to attribute style manipulation. Experiments on the CelebA dataset show that our method outperforms the state of the art on realistic attribute editing with facial details well preserved.
Designing experiments often requires balancing between learning about the true treatment effects and earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can simultaneously attain optimality and computational efficiency goals, and it has recently been used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially distributed rewards. We report its performance in simulated 2-armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI-modified design shows operating characteristics that are comparable in learning (e.g., statistical power) but substantially better in earning (e.g., direct benefits). This illustrates the potential of designs that use a GI approach to allocate participants to improve participant benefits, increase efficiency, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.
The Transformer has achieved impressive success in various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, and such a dataset is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement provided by ImageNet-pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach designed specifically for medical image classification with a Transformer backbone. BOLT consists of two networks, namely the online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch embedding tokens under a different perturbation. To make the most of the Transformer given limited medical data, we propose an auxiliary difficulty ranking task, in which the Transformer is required to identify which branch (i.e., online or target) is processing the more difficult perturbed tokens. Overall, the Transformer strives to distill transformation-invariant features from the perturbed tokens so as to simultaneously measure difficulty and maintain the consistency of the self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of BOLT for medical image classification compared to ImageNet-pretrained weights and state-of-the-art self-supervised learning approaches.
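The online/target setup described in this abstract can be illustrated with a small sketch. The abstract does not specify how the target branch is updated or which loss is used; the example below assumes the conventions of the "Bootstrap Your Own Latent" approach that the method name alludes to, namely an exponential moving average (EMA) of the online weights and a loss of 2 minus twice the cosine similarity. The names `ema_update` and `consistency_loss`, the flat weight lists, and the value of `tau` are hypothetical.

```python
import math
from typing import List


def ema_update(target: List[float], online: List[float], tau: float = 0.99) -> List[float]:
    """Move target weights toward online weights (assumed EMA rule).

    tau close to 1 makes the target branch change slowly.
    """
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


def consistency_loss(prediction: List[float], target_repr: List[float]) -> float:
    """Return 2 - 2 * cosine similarity between two representations.

    The value is 0 when the vectors point in the same direction and
    4 when they point in opposite directions.
    """
    dot = sum(p * t for p, t in zip(prediction, target_repr))
    norm_p = math.sqrt(sum(p * p for p in prediction))
    norm_t = math.sqrt(sum(t * t for t in target_repr))
    return 2.0 - 2.0 * dot / (norm_p * norm_t)


if __name__ == "__main__":
    online_weights = [0.2, -0.4, 1.0]
    target_weights = [0.0, 0.0, 0.0]
    target_weights = ema_update(target_weights, online_weights, tau=0.9)
    print(target_weights)
    print(consistency_loss([1.0, 0.0], [0.0, 1.0]))
```

Only the online branch would receive gradients from the loss in such a setup; the target branch follows it through the moving average. The auxiliary difficulty ranking task from the abstract is not covered by this sketch.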
Text clustering and topic extraction are two important tasks in text mining. Usually, these two tasks are performed separately. For topic extraction to facilitate clustering, we can first project texts into a topic space and then run a clustering algorithm to obtain clusters. To promote topic extraction by clustering, we can first obtain clusters with a clustering algorithm and then extract cluster-specific topics. However, this naive strategy ignores the fact that text clustering and topic extraction are strongly correlated and follow a chicken-and-egg relationship. Performing them separately prevents them from benefiting each other and from achieving the best overall performance. In this paper, we propose an unsupervised text clustering and topic extraction framework (ClusTop), which integrates text clustering and topic extraction into a unified framework and can achieve high-quality clustering results and extract topics from each cluster simultaneously. Our framework includes four components: enhanced language model training, dimensionality reduction, clustering, and topic extraction, where the enhanced language model can be viewed as a bridge between clustering and topic extraction. On the one hand, it provides text embeddings with a strong cluster structure, which facilitates effective text clustering; on the other hand, it pays high attention to topic-related words for topic extraction because of its self-attention architecture. Moreover, the training of the enhanced language model is unsupervised. Experiments on two datasets demonstrate the effectiveness of our framework and provide benchmarks for different model combinations within it.
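The idea of extracting cluster-specific topics once cluster labels are available can be illustrated with a small sketch. It does not reproduce the ClusTop method, which relies on an enhanced language model and its self-attention; instead it swaps in a simple class-based TF-IDF-style score, and the function name `cluster_topics`, the whitespace tokenization, and the scoring formula are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def cluster_topics(docs: List[str], labels: List[int], top_n: int = 3) -> Dict[int, List[str]]:
    """Rank words per cluster with a class-based TF-IDF-style score.

    A word scores high in a cluster when it is frequent there and
    appears in few other clusters.
    """
    # Word counts per cluster, treating each cluster as one merged document.
    cluster_counts: Dict[int, Counter] = {}
    for doc, label in zip(docs, labels):
        cluster_counts.setdefault(label, Counter()).update(doc.lower().split())

    n_clusters = len(cluster_counts)

    # Number of clusters in which each word occurs.
    cluster_freq: Counter = Counter()
    for counts in cluster_counts.values():
        cluster_freq.update(counts.keys())

    topics: Dict[int, List[str]] = {}
    for label, counts in cluster_counts.items():
        total = sum(counts.values())
        scores = {
            word: (count / total) * math.log(1 + n_clusters / cluster_freq[word])
            for word, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        topics[label] = ranked[:top_n]
    return topics


if __name__ == "__main__":
    docs = ["the cat sat", "the cat purred", "the stock fell", "the stock market"]
    labels = [0, 0, 1, 1]
    print(cluster_topics(docs, labels, top_n=2))
```

In a full pipeline the labels would come from clustering the reduced text embeddings rather than being supplied by hand, as they are in this toy usage.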
This paper describes the technologies behind predicting a user's next intent with a concept knowledge graph. The system has been deployed on the Web at Alipay, serving more than 100 million daily active users. Specifically, to characterize user intent explicitly, we propose AlipayKG, an offline concept knowledge graph in the Life-Service domain that models the historical behaviors of users, the rich content users interact with, and the relations between them. We further introduce a Transformer-based model that integrates expert rules from the knowledge graph to infer the online user's next intent. Experimental results demonstrate that the proposed system can effectively enhance the performance of downstream tasks while retaining explainability.